Augmenting Lexicons Automatically: Clustering Semantically Related Adjectives

Authors

  • Kathleen McKeown
  • Vasileios Hatzivassiloglou
Abstract

Our work focuses on identifying various types of lexical data in large corpora through statistical analysis. In this paper, we present a method for grouping adjectives according to their meaning, as a step towards the automatic identification of adjectival scales. We describe how our system exploits two sources of linguistic knowledge in a corpus to compute a measure of similarity between two adjectives, using statistical techniques and a clustering algorithm for grouping. We evaluate the significance of the results produced by our system for a sample set of adjectives.

1. INTRODUCTION

A linguistic scale is a set of words, of the same grammatical category, which can be ordered by their semantic strength or degree of informativeness [1]. For example, "lukewarm," "warm," and "hot" fall along a single adjectival scale since they indicate a variation in the intensity of temperature of the modified noun. Linguistic properties of scales derive both from conventional logical entailment on the linear ordering of their elements and from Gricean scalar implicature [1]. Despite these properties and their potential usefulness in both understanding and generating natural language text, dictionary entries are largely incomplete for adjectives in this regard. Yet, if systems are to use the information encoded in adjectival scales for generation or interpretation (e.g., for selecting an adjective with a particular degree of semantic strength, or for handling negation), they must have access to the sets of words comprising a scale.

While linguists have presented various tests for accepting or rejecting a particular scalar relationship between any two adjectives (e.g., [2], [3]), the common problem with these methods is that they are designed to be applied by a human who incorporates the two adjectives in specific sentential frames (e.g., "X is warm, even hot") and assesses the semantic validity of the resulting sentences. Such tests cannot be used computationally to identify scales in a domain, since the specific sentences do not occur frequently enough in a corpus to produce an adequate description of the adjectival scales in the domain [4]. As scales vary across domains, the task of compiling such information is compounded.

In this paper we describe a technique for automatically grouping adjectives according to their meaning based on a given text corpus, so that all adjectives placed in one group describe different values of the same property. Our method is based on statistical techniques, augmented with linguistic information derived from the corpus, and is completely domain independent. It demonstrates how high-level semantic knowledge can be computed from large amounts of low-level knowledge (essentially plain text, part-of-speech rules, and optionally syntactic relations). While our current system does not distinguish between scalar and non-scalar adjectives, it is a first step in the automatic identification of adjectival scales, since the scales can be subsequently ordered and the non-scalar adjectives filtered on the basis of independent tests, done in part automatically and in part by hand in a post-editing phase. The result is a semi-automated system for the compilation of adjectival scales.

In the following sections, we first describe our algorithm in detail, present the results obtained, and finally provide a formal evaluation of the results.
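The general idea of grouping adjectives by a corpus-derived similarity score can be illustrated with a minimal sketch. The excerpt above does not specify the similarity measure, the two knowledge sources, or the clustering algorithm, so everything here is an assumption made for illustration: the toy adjective-noun pairs, the use of cosine similarity over the nouns each adjective modifies, the single-link merging rule, and the names `noun_profiles`, `cosine`, `cluster`, and `threshold`. It should not be read as the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy (adjective, noun) pairs, invented for illustration only.
PAIRS = [
    ("hot", "water"), ("hot", "weather"), ("hot", "soup"),
    ("warm", "water"), ("warm", "weather"), ("warm", "soup"),
    ("lukewarm", "water"), ("lukewarm", "soup"),
    ("large", "company"), ("large", "increase"), ("large", "loss"),
    ("small", "company"), ("small", "increase"), ("small", "loss"),
]

def noun_profiles(pairs):
    """Count, for each adjective, how often it modifies each noun."""
    profiles = {}
    for adj, noun in pairs:
        profiles.setdefault(adj, Counter())[noun] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two noun-count profiles (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(profiles, threshold=0.5):
    """Single-link grouping: merge two groups whenever any cross-group
    pair of adjectives has similarity >= threshold (assumed rule)."""
    groups = [{adj} for adj in sorted(profiles)]
    merged = True
    while merged:
        merged = False
        for g1, g2 in combinations(groups, 2):
            if any(cosine(profiles[a], profiles[b]) >= threshold
                   for a in g1 for b in g2):
                groups.remove(g1)
                groups.remove(g2)
                groups.append(g1 | g2)
                merged = True
                break  # restart the scan over the updated group list
    return sorted(sorted(g) for g in groups)

if __name__ == "__main__":
    print(cluster(noun_profiles(PAIRS)))
```

On this toy data the temperature adjectives and the size adjectives share no modified nouns, so they end up in separate groups. Note that such a grouping places antonyms like "large" and "small" together, which matches the stated goal that all adjectives in one group describe different values of the same property.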

Similar resources

Semantically Significant Patterns in Dictionary Definitions

Natural language processing systems need large lexicons containing explicit information about lexical-semantic relationships, selection restrictions, and verb categories. Because the labor involved in constructing such lexicons by hand is overwhelming, we have been trying to construct lexical entries automatically from information available in the machine-readable version of Webster's Seventh Co...

Automatic Extraction of Polar Adjectives for the Creation of Polarity Lexicons

Automatic creation of polarity lexicons is a crucial issue to be solved in order to reduce time and effort in the first steps of Sentiment Analysis. In this paper we present a methodology based on linguistic cues that allows us to automatically discover, extract and label subjective adjectives that should be collected in a domain-based polarity lexicon. For this purpose, we designed a bootstra...

Discovering Word Senses for Polysemous Words Using Feature Domain Similarity

This paper presents a new clustering algorithm called DSCBC which is designed to automatically discover word senses for polysemous words. DSCBC is an extension of CBC (Pantel and Lin, 2002), and incorporates feature domain similarity: the similarity between the features themselves, obtained a priori from sources external to the dataset. By incorporating the feature domain similarity in clusteri...

Combining Syntactic Co-occurrences and Nearest Neighbours in Distributional Methods to Remedy Data Sparseness.

The task of automatically acquiring semantically related words has led people to study distributional similarity. The distributional hypothesis states that words that are similar share similar contexts. In this paper we present a technique that aims at improving the performance of a syntax-based distributional method by augmenting the original input of the system (syntactic co-occurrences) wit...

Expanding Opinion Lexicon with Domain Specific Opinion Words Using Semi-Supervised Approach

Opinion words as well as opinion phrases and idioms are very useful in sentiment analysis. All these terms together build opinion or sentiment lexicons. Therefore, opinion lexicons are large lists of terms that encode the sentiment of each phrase within it. Generally, to create such a lexicon automatically, high-precision classifiers use known sentiment vocabulary, e.g. the prior polarity of an...

Publication date: 1993